
    Artimate: an articulatory animation framework for audiovisual speech synthesis

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data are applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated into an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application that illustrates its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.
    Comment: Workshop on Innovation and Applications in Speech Technology (2012)
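
    The core step the abstract describes, driving a rigged 3D model from EMA motion capture, can be sketched roughly as follows. This is a minimal illustration with hypothetical sensor and rest-pose data, not the Artimate implementation:

```python
import numpy as np

def ema_to_bone_offsets(ema_frames, rest_pose):
    """Turn raw EMA sensor trajectories (frames x sensors x 3, in mm)
    into per-frame translation offsets relative to a rest pose; in a
    skeletal rig, each offset would then drive the bone attached to
    that sensor (e.g. tongue tip, tongue back, jaw)."""
    return np.asarray(ema_frames, dtype=float) - np.asarray(rest_pose, dtype=float)

# hypothetical data: two frames, two sensors
rest = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
frames = [[[1.0, 0.5, 0.0], [10.0, 1.0, 0.0]],
          [[2.0, 1.0, 0.0], [11.0, 2.0, 0.0]]]
offsets = ema_to_bone_offsets(frames, rest)
# offsets[0, 0] is the displacement of the first sensor in the first frame
```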

    Introduction de contraintes pour l'inversion acoustico-articulatoire utilisant une table hypercubique

    National conference with proceedings and peer review. National audience.
    Our acoustic-to-articulatory inversion method exploits an original codebook representing the articulatory space by hypercubes. The articulatory space is decomposed into regions where the articulatory-to-acoustic mapping is linear. Each region is represented by a hypercube. The inversion procedure retrieves articulatory vectors corresponding to an acoustic entry from the hypercube codebook. As the dimension of the articulatory space is greater than that of the acoustic space, the corresponding null space is sampled by linear programming to retrieve all the possible solutions. A dynamic procedure is used to recover the best articulatory trajectory according to a minimum articulatory rate criterion. The addition of constraints allows the inversion process to be focused on realistic inverse articulatory trajectories.
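
    The piecewise-linear inversion described above can be illustrated on a toy region. Assuming a locally linear map y = Ax + b inside one hypercube, the solution set is a particular solution plus the null space of A; the matrix here is invented, and the null space is sampled on a simple grid rather than by linear programming as in the paper:

```python
import numpy as np

def invert_in_region(A, b, y, n_samples=5):
    """Invert the locally linear articulatory-to-acoustic map y = A x + b
    within one hypercube region. Since the articulatory dimension exceeds
    the acoustic one, all solutions are a particular solution plus any
    vector from the null space of A, sampled here along each basis vector."""
    x_p, *_ = np.linalg.lstsq(A, y - b, rcond=None)  # particular solution
    _, s, vt = np.linalg.svd(A)
    null_basis = vt[len(s):]                         # rows spanning null(A)
    coeffs = np.linspace(-1.0, 1.0, n_samples)
    return [x_p + c * n for n in null_basis for c in coeffs]

# toy example: 2 formants predicted from 3 articulatory parameters
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.zeros(2)
solutions = invert_in_region(A, b, np.array([1.0, 2.0]))
# every sampled articulatory vector maps back to the same acoustic entry
```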

    Mixing faces and voices: a study of the influence of faces and voices on audiovisual intelligibility

    International audience
    This study examined the influence of mixing faces and voices on audiovisual intelligibility. The goal was to study the effect of combining the two sources of information on audiovisual intelligibility. Cross-talker dubbing was performed between the faces and voices of 10 meaningful sentences pronounced by 10 talkers: 5 females and 5 males. Human subjects were asked to rate the articulation of the output videos, and the results for the original and dubbed videos were compared. Across almost all combinations, the audiovisual intelligibility was acceptable; the intelligibility of the speakers varied, however. We observed an influence of the audio/visual channel on the overall intelligibility, which can increase or decrease depending on the intelligibility of that channel.

    Towards an articulatory tongue model using 3D EMA

    International audience
    Within the framework of an acoustic-visual (AV) speech synthesizer, we describe a preliminary tongue model that is both simple and flexible, and which is controlled by 3D electromagnetic articulography (EMA) data through an animation interface, providing realistic tongue movements for improved visual intelligibility. Data from a pilot study are discussed and deemed encouraging, and the integration of the tongue model into the AV synthesizer is outlined.

    Utilisation d'un dictionnaire hypercubique pour l'inversion acoustico-articulatoire

    National conference with proceedings and peer review. National audience.
    In this article, we present a method for constructing an articulatory codebook that gives good coverage of the articulatory space with a limited number of points. It is a new representation of the articulatory space by hypercubes. For each vertex of a hypercube, both the articulatory parameters and the acoustic parameters (the formants) are known. We present an interpolation method for computing, with high precision, the acoustic trajectory corresponding to an articulatory trajectory, and an inversion method that recovers the articulatory parameters from the formants. The strength of the inversion method based on this codebook is its robustness to the non-linearity of the articulatory-to-acoustic relation.
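
    The interpolation step, computing an acoustic value anywhere inside a hypercube from the values known at its vertices, can be sketched as plain multilinear interpolation. The region and formant values below are invented for illustration:

```python
import numpy as np

def hypercube_interpolate(vertex_values, t):
    """Multilinear interpolation inside a unit hypercube: vertex_values
    maps each corner (a tuple of 0/1 coordinates) to a known acoustic
    value (e.g. a formant in Hz), and t holds the articulatory
    coordinates in [0, 1] along each axis."""
    total = 0.0
    for corner, value in vertex_values.items():
        weight = np.prod([ti if c else 1.0 - ti for c, ti in zip(corner, t)])
        total += weight * value
    return total

# hypothetical 2-D region: F1 values at the four corners of a square
corners = {(0, 0): 500.0, (1, 0): 700.0, (0, 1): 600.0, (1, 1): 800.0}
f1 = hypercube_interpolate(corners, [0.5, 0.5])  # → 650.0
```

    At a vertex the weights collapse to the stored value, so the interpolant is exact on the codebook points.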

    Predicting Tongue Positions from Acoustics and Facial Features

    International audience
    We test the hypothesis that adding information about the positions of electromagnetic articulography (EMA) sensors on the lips and jaw can improve the results of a typical acoustic-to-EMA mapping system, based on support vector regression, that targets the tongue sensors. Our initial motivation is to use such a system to add tongue animation to a talking head built by concatenating bimodal acoustic-visual units. For completeness, we also train a system that maps only jaw and lip information to tongue information.
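
    The effect being tested, whether appending lip and jaw positions to the acoustic features improves tongue prediction, can be sketched on synthetic data. Plain ridge regression stands in here for the support vector regression actually used, and all dimensions and data are invented:

```python
import numpy as np

def fit_linear_map(X, Y, ridge=1e-3):
    """Ridge regression (a stand-in for SVR): learn a weight matrix W
    mapping input feature rows to tongue-sensor coordinate rows."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(200, 13))   # e.g. one MFCC vector per frame
lips_jaw = rng.normal(size=(200, 6))    # 2 sensors x 3 coords (hypothetical)
# synthetic tongue positions that genuinely depend on both streams
tongue = acoustic[:, :3] + 0.5 * lips_jaw[:, :3]

W_ac = fit_linear_map(acoustic, tongue)
W_both = fit_linear_map(np.hstack([acoustic, lips_jaw]), tongue)
err_ac = np.mean((acoustic @ W_ac - tongue) ** 2)
err_both = np.mean((np.hstack([acoustic, lips_jaw]) @ W_both - tongue) ** 2)
# adding the lip/jaw features lowers the fitting error on this toy data
```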

    Image processing device

    An episodic memory-based solution for the acoustic-to-articulatory inversion problem

    International audience
    This paper presents an acoustic-to-articulatory inversion method based on an episodic memory. An episodic memory is an interesting model for two reasons. First, it does not rely on any assumptions about the mapping function; instead, it relies on real synchronized acoustic and articulatory data streams. Second, the memory inherently represents the real articulatory dynamics as observed. It is argued that computational models of episodic memory, as they are usually designed, cannot provide a satisfying solution to the acoustic-to-articulatory inversion problem due to the insufficient quantity of training data. Therefore, a generative episodic memory (G-Mem) is proposed, which is able to produce articulatory trajectories that do not belong to the set of episodes the memory is built on. It is evaluated using two electromagnetic articulography corpora, one for English and one for French, and compared with a codebook-based method and with a classical episodic memory (termed a concatenative episodic memory) in terms of both its modeling of articulatory dynamics and its generalization capabilities. The results show the effectiveness of the method: an overall root-mean-square error of 1.65 mm and a correlation of 0.71 are obtained for G-Mem, comparable to those of recently proposed methods.
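
    The generative aspect can be illustrated with a frame-level nearest-neighbour step: averaging the articulatory frames synchronized with the k closest stored acoustic frames yields articulatory vectors that need not appear in any stored episode. This is a deliberately simplified stand-in for G-Mem, with invented toy data:

```python
import numpy as np

def knn_inverse(acoustic_mem, artic_mem, query, k=3):
    """Retrieve the k stored acoustic frames closest to the query and
    return a distance-weighted average of their synchronized
    articulatory frames."""
    d = np.linalg.norm(acoustic_mem - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)  # closer frames weigh more
    return (w[:, None] * artic_mem[idx]).sum(axis=0) / w.sum()

# toy memory: 4 synchronized acoustic/articulatory frame pairs
acoustic_mem = np.array([[0.0], [1.0], [2.0], [10.0]])
artic_mem = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
out = knn_inverse(acoustic_mem, artic_mem, np.array([1.2]), k=2)
# out interpolates between stored frames rather than copying one of them
```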

    Continuous episodic memory based speech recognition using articulatory dynamics

    International audience
    In this paper we present a speech recognition system based on articulatory dynamics. We do not extend the acoustic features with any explicit articulatory measurements; instead, the articulatory dynamics of speech are structurally embodied within episodic memories. The proposed recognizer is made of different memories, each specialized for a particular articulator. As the articulators do not all contribute equally to the realization of a particular phoneme, the specialized memories do not perform equally for each phoneme. We show, through phone-string recognition experiments, that combining the recognition hypotheses produced by the different articulator-specialized memories leads to significant recognition improvements.
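
    A minimal version of the combination step is a frame-wise majority vote over the phone streams emitted by the articulator-specialized memories; the real system's combination scheme is likely more elaborate, and the streams below are invented:

```python
from collections import Counter

def combine_hypotheses(streams):
    """Frame-wise majority vote over phone hypotheses produced by
    several articulator-specialized recognizers."""
    return [Counter(frame).most_common(1)[0][0] for frame in zip(*streams)]

# hypothetical per-frame phone hypotheses from three specialized memories
tongue = ["p", "a", "t", "a"]
lips   = ["b", "a", "t", "a"]
jaw    = ["p", "a", "d", "a"]
combine_hypotheses([tongue, lips, jaw])  # → ['p', 'a', 't', 'a']
```

    A memory that is unreliable for a given phoneme is outvoted by the memories of the articulators that matter for it.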

    A Study of the French Vowels Through the Main Constriction of the Vocal Tract Using an Acoustic-to-Articulatory Inversion Method

    International conference with proceedings and peer review. International audience.
    This paper presents a study of the articulatory properties of French vowels using an acoustic-to-articulatory inversion method. The advantage of such an approach is that all the possible articulatory configurations can be studied independently of any articulatory preferences linked to a given speaker. Furthermore, it bypasses the issue of acquiring a vast amount of articulatory data through medical imaging techniques. The inversion method exploits an articulatory codebook whose acoustic precision is constant whatever the articulatory region considered. Since the inversion is performed from the first three formants of the vowels to recover the seven parameters of Maeda's model, the null space of the articulatory-to-acoustic mapping is explored to recover all the possible articulatory shapes. Applied to French vowels, this method allows the different places of articulation to be determined.